Sample data

Table 1: Sample data
Data File(s) Format Source
“Nafot” nafot.shp (+7) Shapefile https://www.gov.il/he/Departments/Guides/info-gis
Railways RAIL_STRATEGIC.shp (+7) Shapefile https://data.gov.il/dataset/rail_strategic
Statistical areas statisticalareas_demography2018.gdb GDB https://www.cbs.gov.il/he/Pages/geo-layers.aspx

The data and code for this lecture can be downloaded from:

For more details on setting up the environment and sample data, see the preparation document.

R for Spatial Data Analysis

Software for analysis of spatial data

**QGIS**, an example of Graphical User Interface (GUI) software

Figure 1: QGIS, an example of Graphical User Interface (GUI) software

**R**, an example of Command Line Interface (CLI) software

Figure 2: R, an example of Command Line Interface (CLI) software

What is R?

R is a programming language and environment, originally developed for statistical computing and graphics. Notable advantages of R are that it is a full-featured programming language, yet customized for working with data, relatively simple and has a huge collection of ~16,000 packages in the official repository from various areas of interest.Over time, there was an increasing number of contributed packages for handling and analyzing spatial data in R. Today, spatial analysis is a major functionality in R. As of October 2020, there are ~185 packages specifically addressing spatial analysis in R, and many more are indirectly related to spatial data.

Books on Spatial Data Analysis with R

Figure 3: Books on Spatial Data Analysis with R

History of spatial analysis in R

Some important events in the history of spatial analysis support in R are summarized in Table 2.

Table 2: Significant events in the history of R-spatial
Year Event
pre-2003 Variable and incomplete approaches (MASS, spatstat, maptools, geoR, splancs, gstat, …)
2003 Consensus that a package defining standard data structures should be useful; rgdal released on CRAN
2005 sp released on CRAN; sp support in rgdal (Section ??
2008 Applied Spatial Data Analysis with R, 1st ed.
2010 raster released on CRAN (Section ??)
2011 rgeos released on CRAN
2013 Applied Spatial Data Analysis with R, 2nd ed.
2016 sf released on CRAN (Section ??)
2018 stars released on CRAN (Section ??)
2019 Geocomputation with R (https://geocompr.robinlovelace.net/)
2021(?) Spatial Data Science (https://www.r-spatial.org/book/)

R as a GIS?

The question that arises here is: can R be used as a Geographic Information System (GIS), or as a comprehensive toolbox for doing spatial analysis? The answer is definitely yes. Moreover, R has some important advantages over traditional approaches to GIS, i.e., software with graphical user interfaces such as ArcGIS or QGIS.

General advantages of Command Line Interface (CLI) software include:

  • Automation—Doing otherwise unfeasible repetitive tasks
  • Reproducibility—Precise control of instructions to the computer

Moreover, specific strengths of R as a GIS are:

  • R capabilities in data processing and visualization, combined with dedicated packages for spatial data
  • A single environment encompassing all analysis aspects—acquiring data, computation, statistics, visualization, Web, etc.

Nevertheless, there are situations when other tools are needed:

  • Interactive editing or georeferencing (but see mapedit package)
  • Unique GIS algorithms (3D analysis, label placement, splitting lines at intersections)
  • Data that cannot fit in RAM (but R can connect to spatial databases1 and other softwere for working with big data)

The following sections (????) highlight some of the capabilities of spatial data analysis packages in R, through short examples.

Input and output of spatial data

Reading spatial layers from a file into an R data structure, or writing the R data structure into a file, are handled by external libraries:

  • GDAL/OGR is used for reading/writing vector and raster files, with sf and stars
  • PROJ is used for handling CRS, in both sf and stars
  • Working with specialized formats, e.g., HDF with gdalUtils or NetCDF with ncdf4

sf: Geoprocessing Vector Layers

GEOS is used for geometric operations on vector layers with sf:

  • Numeric operators—Area, Length, Distance…
  • Logical operators—Contains, Within, Within distance, Crosses, Overlaps, Equals, Intersects, Disjoint, Touches…
  • Geometry generating operators—Centroid, Buffer, Intersection, Union, Difference, Convex-Hull, Simplification…
Buffer function

Figure 4: Buffer function

stars: Geoprocessing Rasters

Geometric operations on rasters can be done with package raster:

  • Accessing cell values—As vector, As matrix, Extract to points / lines / polygons, random / regular sampling, Frequency table, Histogram…
  • Raster algebra—Arithmetic (+, -, …), Math (sqrt, log10, …), logical (!, ==, >, …), summary (mean, max, …), Mask, Overlay…
  • Changing resolution and extent—Crop, Mosaic, (Dis)aggregation, Reprojection, Resampling, Shift, Rotation…
  • Focal operators—Distance, Direction, Focal Filter, Slope, Aspect, Flow direction…
  • Transformations—Vector layers <-> Raster…

gstat: Geostatistical Modelling

Univariate and multivariate geostatistics:

  • Variogram modelling
  • Ordinary and universal point or block (co)kriging
  • Cross-validation
Predicted Zinc concentration, using Ordinary Kriging

Figure 5: Predicted Zinc concentration, using Ordinary Kriging

spdep: Spatial dependence models

Modelling with spatial weights:

  • Building neighbor lists and spatial weights
  • Tests for spatial autocorrelation for areal data (e.g. Moran’s I)
  • Spatial regression models (e.g. SAR, CAR)
Neighbors list based on regions with contiguous boundaries

Figure 6: Neighbors list based on regions with contiguous boundaries

spatstat: Spatial point patterns

Techniques for statistical analysis of spatial point patterns, such as:

  • Kernel density estimation
  • Detection of clustering using Ripley’s K-function
  • Spatial logistic regression
Distance map for the Biological Cells point pattern dataset

Figure 7: Distance map for the Biological Cells point pattern dataset

RPostgreSQL: Working with PostGIS

Package sf combined with RPostgreSQL can be used to read from, and write to, a PostGIS spatial database:

# library(sf)
# library(RPostgreSQL)

# con = dbConnect(
#   PostgreSQL(),
#   dbname = "gisdb",
#   host = "159.89.13.241",
#   port = 5432,
#   user = "geobgu",
#   password = "*******"
# )
# q = "SELECT name_lat, geometry FROM plants LIMIT 3;"
# st_read(con, query = q)

Other examples

  • Network routing: stplanr, dodgr, sfnetworks, igraph
  • Simplification rmapshaper
  • Access to OSM data: osmdata, osmextract
  • Static mapping: ggplot2, tmap, rasterVis
  • Web mapping: leaflet, mapview, deckgl
  • Mapping APIs: mapsapi
  • Spherical geometry: geosphere, s2, lwgeom

Spatial data structures

What is a vector layer?

Geometry (left) and attributes (right) of vector layers (https://www.neonscience.org/dc-shapefile-attributes-r)

Figure 8: Geometry (left) and attributes (right) of vector layers (https://www.neonscience.org/dc-shapefile-attributes-r)

Geometry types

Simple Feature types (see also: https://r-spatial.github.io/sf/articles/sf1.html)

Figure 9: Simple Feature types (see also: https://r-spatial.github.io/sf/articles/sf1.html)

File formats

Table 3: Common vector layer file formats
Type Format File extension
Binary ESRI Shapefile .shp, .shx, .dbf, .prj, …
GeoPackage (GPKG) .gpkg
Plain Text GeoJSON .json or .geojson
GPS Exchange Format (GPX) .gpx
Keyhole Markup Language (KML) .kml
Spatial Databases PostGIS / PostgreSQL

sf data structures

Package sf defines a hierarchical class system with three classes (Table ??):

  • Class sfg is for a single geometry
  • Class sfc is a set of geometries with a CRS
  • Class sf is a layer with attributes
Table 4: Spatial data structures in package sf
class hierarchy data
sfg geometry coords, type, dimension
sfc geometry column set of sfg + CRS
sf layer sfc + attributes

sf is a relative new (2016-) R package for handling vector layers in R. In the long-term, sf aims to replace rgdal (2003-), sp (2005-), and rgeos (2011-).

The main innovation in sf is a complete implementation of the Simple Features standard. Since 2003, Simple Features been widely implemented in spatial databases (such as PostGIS), commercial GIS (e.g., ESRI ArcGIS) and forms the vector data basis for libraries such as GDAL. When working with spatial databases, Simple Features are commonly specified as Well Known Text (WKT). A subset of simple features forms the GeoJSON standard.

It is important to note that the sf package depends on several external software components (installed along with the R package), most importantly GDAL, GEOS and PROJ (Figure 10). These well-tested and popular open-source components are common to numerous open-source and commercial software for spatial analysis, such as QGIS and PostGIS.

`sf` package dependencies^[https://github.com/edzer/rstudio_conf]

Figure 10: sf package dependencies2

The sf class extends the data.frame class to include a geometry column. This is similar to the way that spatial databases are structured. For example, the sample dataset shown in Figure 11 represents a polygonal layer with three features and six non-spatial attributes. The attributes refer to demographic and epidemiological attributes of US counties, such as the number of births in 1974 (BIR74), the number of sudden infant death cases in 1974 (SID74), and so on. The seventh column is the geometry column, containing the polygon geometries.

Structure of an `sf` object^[https://cran.r-project.org/web/packages/sf/vignettes/sf1.html]

Figure 11: Structure of an sf object3

Figure 12 shows what layer in Figure 11 would look like when mapped. We can see the outline of the three polygons, as well as the values of all six non-spatial attributes (in separate panels).

Visualization of the `sf` object shown in Figure \@ref(fig:nc-geometry-column)

Figure 12: Visualization of the sf object shown in Figure 11

Reading vector data

Function st_read can be used to read vector layers into sf data structures:

library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.3, PROJ 7.1.1
nafot = st_read("data/nafot.shp")
## Reading layer `nafot' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/nafot.shp' using driver `ESRI Shapefile'
## Simple feature collection with 15 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N

Printing the object gives a summary of its properties, and the values of the (first 10) features:

nafot
## Simple feature collection with 15 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##     name_eng                       geometry
## 1      Zefat POLYGON ((739780.1 3686007,...
## 2  Jerusalem POLYGON ((672273.8 3518394,...
## 3   Kinneret POLYGON ((745560 3649860, 7...
## 4   Yizre'el POLYGON ((702283.1 3628046,...
## 5       Akko POLYGON ((702725.9 3630513,...
## 6      Golan POLYGON ((759304.4 3691202,...
## 7      Haifa POLYGON ((701391.6 3631170,...
## 8     Hadera POLYGON ((706537.1 3602188,...
## 9     Sharon POLYGON ((692687.3 3583974,...
## 10     Ramla POLYGON ((672841.5 3544808,...

As mentioned above, a layer (geometry+attributes) is represented by an sf object:

class(nafot)
## [1] "sf"         "data.frame"

If we want just the geometric part, it can be extracted with st_geometry, resulting in an object of class sfg (geometry column):

st_geometry(nafot)
## Geometry set for 15 features 
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 5 geometries:
## POLYGON ((739780.1 3686007, 739786.6 3686007, 7...
## POLYGON ((672273.8 3518394, 672368.4 3518511, 6...
## POLYGON ((745560 3649860, 745585.8 3649853, 745...
## POLYGON ((702283.1 3628046, 702283.5 3628046, 7...
## POLYGON ((702725.9 3630513, 702724.1 3630510, 7...

Conversely, If we want just the non-geometric part, it can be extracted with st_drop_geometry, resulting in a data.frame:

st_drop_geometry(nafot)
##       name_eng
## 1        Zefat
## 2    Jerusalem
## 3     Kinneret
## 4     Yizre'el
## 5         Akko
## 6        Golan
## 7        Haifa
## 8       Hadera
## 9       Sharon
## 10       Ramla
## 11     Rehovot
## 12    Ashqelon
## 13 Be'er Sheva
## 14 Petah Tiqwa
## 15    Tel Aviv

The plot function is a quick way to see the spatial arrangment and attribute values in an sf layer. For example:

plot(nafot, key.width = lcm(4))
The `nafot` layer

Figure 13: The nafot layer

Coordinate Reference Systems (CRS)

A Coordinate Reference System (CRS) defines how the coordinates in the data relate to the surface of the Earth. There are two main types of CRS (Figure 14):

  • Geographic—longitude and latitude, in degrees
  • Projected—implying flat surface, w/ units (e.g. meters)
US counties in WGS84 and US Atlas projections

Figure 14: US counties in WGS84 and US Atlas projections

A vector layer can be reprojected with st_transform. The st_transform function has two parameters:

  • x—The layer to be reprojected
  • crs—The target CRS

Where a CRS can be specified using:

  • An EPSG code (e.g., 4326)
  • A PROJ4 string (e.g., "+proj=longlat +datum=WGS84 +no_defs")
  • A WKT string

CRS of nafot vector layer:

st_crs(nafot)
## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 36N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 36N",
##     BASEGEOGCRS["WGS 84",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 36N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",33,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     ID["EPSG",32636]]
nafot_wgs84 = st_transform(nafot, 4326)
nafot_wgs84
## Simple feature collection with 15 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 34.26679 ymin: 29.48737 xmax: 35.89468 ymax: 33.33479
## geographic CRS: WGS 84
## First 10 features:
##     name_eng                       geometry
## 1      Zefat POLYGON ((35.57481 33.2865,...
## 2  Jerusalem POLYGON ((34.81956 31.78814...
## 3   Kinneret POLYGON ((35.6271 32.95949,...
## 4   Yizre'el POLYGON ((35.15965 32.77173...
## 5       Akko POLYGON ((35.16491 32.79389...
## 6      Golan POLYGON ((35.78575 33.32878...
## 7      Haifa POLYGON ((35.15081 32.80006...
## 8     Hadera POLYGON ((35.19932 32.53785...
## 9     Sharon POLYGON ((35.0482 32.37614,...
## 10     Ramla POLYGON ((34.83026 32.02623...
plot(st_geometry(nafot), main = "UTM", axes = TRUE)
plot(st_geometry(nafot_wgs84), main = "WGS84", axes = TRUE)
Nafot in UTM and WGS84 coordinate reference systemsNafot in UTM and WGS84 coordinate reference systems

Figure 15: Nafot in UTM and WGS84 coordinate reference systems

Geoprocessing functions

Reading layers into R

In the following examples, we will use two vector layers:

  • nafot—“Nafa” administrative areas in Israel
  • rail—Railways in Israel

First of all, we read the layers into R, using st_read:

nafot = st_read("data/nafot.shp")
## Reading layer `nafot' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/nafot.shp' using driver `ESRI Shapefile'
## Simple feature collection with 15 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
rail = st_read("data/RAIL_STRATEGIC.shp")
## Reading layer `RAIL_STRATEGIC' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/RAIL_STRATEGIC.shp' using driver `ESRI Shapefile'
## Simple feature collection with 217 features and 6 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 157822.2 ymin: 385552.2 xmax: 255113.7 ymax: 790663.5
## projected CRS:  Israel 1993 / Israeli TM Grid
nafot
## Simple feature collection with 15 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##     name_eng                       geometry
## 1      Zefat POLYGON ((739780.1 3686007,...
## 2  Jerusalem POLYGON ((672273.8 3518394,...
## 3   Kinneret POLYGON ((745560 3649860, 7...
## 4   Yizre'el POLYGON ((702283.1 3628046,...
## 5       Akko POLYGON ((702725.9 3630513,...
## 6      Golan POLYGON ((759304.4 3691202,...
## 7      Haifa POLYGON ((701391.6 3631170,...
## 8     Hadera POLYGON ((706537.1 3602188,...
## 9     Sharon POLYGON ((692687.3 3583974,...
## 10     Ramla POLYGON ((672841.5 3544808,...
rail
## Simple feature collection with 217 features and 6 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 157822.2 ymin: 385552.2 xmax: 255113.7 ymax: 790663.5
## projected CRS:  Israel 1993 / Israeli TM Grid
## First 10 features:
##       UPGRADE SHAPE_LENG SHAPE_Le_1               SEGMENT ISACTIVE
## 1  שדרוג 2040  14489.401  12379.715     כפר יהושע - נשר_1     פעיל
## 2  שדרוג 2030   2159.344   2274.112    באר יעקב-ראשונים_2     פעיל
## 3  שדרוג 2040   4942.306   4942.306          נתבג - נען_3  לא פעיל
## 4  שדרוג 2040   2829.892   2793.338 לב המפרץ מזרח - נשר_4     פעיל
## 5  שדרוג 2040   1976.491   1960.171     קרית גת - להבים_5     פעיל
## 6  שדרוג 2040   1195.701   1195.701    רמלה - רמלה מזרח_6     פעיל
## 7  שדרוג 2030   4224.799   4224.799   אור עקיבא - עתלית_7  לא פעיל
## 8    חדש 2040  36452.633  36452.633      מרכז ספיר-פארן_8  לא פעיל
## 9  שדרוג 2040   7822.722   7892.334       נען - בית שמש_9     פעיל
## 10 שדרוג 2030   4610.831   4176.157       נתבג - ההגנה_10     פעיל
##                   UPGRAD                       geometry
## 1  שדרוג בין 2030 ל-2040 LINESTRING (205530.1 741563...
## 2          שדרוג עד 2030 LINESTRING (181507.6 650706...
## 3  שדרוג בין 2030 ל-2040 LINESTRING (189180.6 645433...
## 4  שדרוג בין 2030 ל-2040 LINESTRING (203482.8 744181...
## 5  שדרוג בין 2030 ל-2040 LINESTRING (178574.1 609392...
## 6  שדרוג בין 2030 ל-2040 LINESTRING (189266.6 647211...
## 7          שדרוג עד 2030 LINESTRING (193268.9 719774...
## 8  פתיחה בין 2030 ל-2040 LINESTRING (219966 509396.8...
## 9  שדרוג בין 2030 ל-2040 LINESTRING (188081.3 642530...
## 10         שדרוג עד 2030 LINESTRING (184248 657573.3...

Reprojection

For any type of spatial analysis, we usually need all input layers to be in the same CTS. For that purpose, we will reproject the rail layer to the CRS of the nafot layer:

rail = st_transform(rail, st_crs(nafot))

Basic plotting

We can plot each layer on its own, as shown above, to examine its attributes:

plot(nafot, key.width = lcm(4))
The `nafot` layer

Figure 16: The nafot layer

plot(rail)
The `rail` layer

Figure 17: The rail layer

We can also plot the two layer geometries together, to examine their arrangement (Figure 18):

plot(st_geometry(nafot), border = "grey")
plot(st_geometry(rail), add = TRUE)
The `nafot` and `rail` geometries

Figure 18: The nafot and rail geometries

Subsetting

Non-spatial

Subsetting (filtering) of features in an sf vector layer is done in exactly the same way as filtering rows in a data.frame. For example, the following expression filters the rail layer to keep only those railway lines which are active:

rail = rail[rail$ISACTIVE == "פעיל", ]
rail
## Simple feature collection with 161 features and 6 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       UPGRADE SHAPE_LENG SHAPE_Le_1               SEGMENT ISACTIVE
## 1  שדרוג 2040  14489.401  12379.715     כפר יהושע - נשר_1     פעיל
## 2  שדרוג 2030   2159.344   2274.112    באר יעקב-ראשונים_2     פעיל
## 4  שדרוג 2040   2829.892   2793.338 לב המפרץ מזרח - נשר_4     פעיל
## 5  שדרוג 2040   1976.491   1960.171     קרית גת - להבים_5     פעיל
## 6  שדרוג 2040   1195.701   1195.701    רמלה - רמלה מזרח_6     פעיל
## 9  שדרוג 2040   7822.722   7892.334       נען - בית שמש_9     פעיל
## 10 שדרוג 2030   4610.831   4176.157       נתבג - ההגנה_10     פעיל
## 11 שדרוג 2040   1426.854   1426.854   סוקולוב - נורדאו_11     פעיל
## 13 שדרוג 2030   8758.880   8758.880        נהריה - עכו_13     פעיל
## 14 שדרוג 2030   5102.569   5102.569    חולות-יבנה מערב_14     פעיל
##                   UPGRAD                       geometry
## 1  שדרוג בין 2030 ל-2040 LINESTRING (692568.6 362751...
## 2          שדרוג עד 2030 LINESTRING (670422.8 353618...
## 4  שדרוג בין 2030 ל-2040 LINESTRING (690467.1 363008...
## 5  שדרוג בין 2030 ל-2040 LINESTRING (668326.9 349482...
## 6  שדרוג בין 2030 ל-2040 LINESTRING (678250.9 353284...
## 9  שדרוג בין 2030 ל-2040 LINESTRING (677161.1 352814...
## 10         שדרוג עד 2030 LINESTRING (673022.5 354310...
## 11 שדרוג בין 2030 ל-2040 LINESTRING (679299.3 356088...
## 13         שדרוג עד 2030 LINESTRING (696063 3653684,...
## 14         שדרוג עד 2030 LINESTRING (664704.1 353470...

We can also subset the columns (attributes) we need. For example, in the following expressions we create an ID variable segment_id, and keep only that variable:

rail$segment_id = 1:nrow(rail)
rail = rail["segment_id"]
rail
## Simple feature collection with 161 features and 1 field
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##    segment_id                       geometry
## 1           1 LINESTRING (692568.6 362751...
## 2           2 LINESTRING (670422.8 353618...
## 4           3 LINESTRING (690467.1 363008...
## 5           4 LINESTRING (668326.9 349482...
## 6           5 LINESTRING (678250.9 353284...
## 9           6 LINESTRING (677161.1 352814...
## 10          7 LINESTRING (673022.5 354310...
## 11          8 LINESTRING (679299.3 356088...
## 13          9 LINESTRING (696063 3653684,...
## 14         10 LINESTRING (664704.1 353470...

Spatial

We can also subset feature according to intersection with another layer, using the latter as an index. For example, the following expression creates a subset of nafot, named nafot1, with only those features intersecting the rail layer:

nafot1 = nafot[rail, ]
nafot1
## Simple feature collection with 12 features and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 742395.3 ymax: 3665564
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng                       geometry
## 2    Jerusalem POLYGON ((672273.8 3518394,...
## 4     Yizre'el POLYGON ((702283.1 3628046,...
## 5         Akko POLYGON ((702725.9 3630513,...
## 7        Haifa POLYGON ((701391.6 3631170,...
## 8       Hadera POLYGON ((706537.1 3602188,...
## 9       Sharon POLYGON ((692687.3 3583974,...
## 10       Ramla POLYGON ((672841.5 3544808,...
## 11     Rehovot POLYGON ((660883 3525603, 6...
## 12    Ashqelon POLYGON ((660883 3525603, 6...
## 13 Be'er Sheva POLYGON ((639165.2 3481650,...

Figure 19 shows the nafot subset (in grey fill) and the railway lines layer.

plot(st_geometry(nafot), border = "grey50")
plot(st_geometry(nafot1), border = "grey50", col = "grey90", add = TRUE)
plot(st_geometry(rail), add = TRUE)
The `nafot` and `rail` geometries

Figure 19: The nafot and rail geometries

Geometric calculations

Geometric operations on vector layers can conceptually be divided into three groups according to their output:

  • Numeric values (Section ??)—Functions that summarize geometrical properties of:
    • A single layer—e.g., area, length (Section ??)
    • A pair of layers—e.g., distance (Section ??)
  • Logical values (Section ??)—Functions that evaluate whether a certain condition holds true, regarding:
    • A single layer—e.g., geometry is valid
    • A pair of layers—e.g., feature A intersects feature B (Section ??)
  • Spatial layers (Section ??)—Functions that create a new layer based on:
    • A single layer—e.g., centroid, buffer (Sections ??, ??, ??, ??)
    • A pair of layers—e.g., intersection area (Section ??)

Numeric

There are several functions to calculate numeric geometric properties of vector layers in package sf:

  • st_length
  • st_area
  • st_distance
  • st_bbox
  • st_dimension

For example, we can calculate the area of each feature in the nafot layer (i.e. each state) using st_area:

nafot$area = st_area(nafot)
nafot$area[1:3]
## Units: [m^2]
## [1] 660340513 653746831 639422163

The result is an object of class units:

class(nafot$area)
## [1] "units"

We can convert measurements to different units with set_units from package units:

library(units)
## udunits system database from /usr/share/xml/udunits
nafot$area = set_units(nafot$area, "km^2")
nafot$area[1:3]
## Units: [km^2]
## [1] 660.3405 653.7468 639.4222

Inspecting the result:

plot(nafot[, "area"])
Calculated `area` attribute

Figure 20: Calculated area attribute

Logical

Given two layers, x and y, the following logical geometric functions check whether each feature in x maintains the specified relation with each feature in y:

  • st_intersects
  • st_disjoint
  • st_touches
  • st_crosses
  • st_within
  • st_contains
  • st_overlaps
  • st_covers
  • st_covered_by
  • st_equals
  • st_equals_exact

When specifying sparse=FALSE the functions return a logical matrix. Each element i,j in the matrix is TRUE when f(x[i], y[j]) is TRUE. For example, this creates a matrix of intersection relations between nafot:

int = st_intersects(nafot1, nafot1, sparse = FALSE)
int
##        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
##  [1,]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE
##  [2,] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [3,] FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [4,] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [5,] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [6,] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
##  [7,]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
##  [8,]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
##  [9,]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE
## [12,] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE

The following code section visualizes the matrix:

int1 = apply(int, 2, rev)
int1 = t(int1)
image(int1, col = c("lightgrey", "red"), asp = 1, axes = FALSE)
axis(3, at = seq(0, 1, 1/(nrow(int1)-1)), labels = nafot1$name_eng, las = 2, lwd = 0, lwd.ticks = 1, cex.axis = 0.75)
axis(2, at = seq(0, 1, 1/(nrow(int1)-1)), labels = rev(nafot1$name_eng), las = 1, pos = -0.046, lwd = 0, lwd.ticks = 1, cex.axis = 0.75)
Intersection relations between `nafot` features

Figure 21: Intersection relations between nafot features

Spatial

sf provides common geometry-generating functions applicable to individual geometries, such as:

  • st_centroid
  • st_buffer
  • st_sample
  • st_convex_hull
  • st_voronoi
Geometry-generating operations on individual layers

Figure 22: Geometry-generating operations on individual layers

For example, the following expression uses st_centroid to create a layer of “Nafa” centroids:

nafot_ctr = st_centroid(nafot)

They can be plotted as follows, the result is shown in Figure 23:

plot(st_geometry(nafot), border = "grey")
plot(st_geometry(nafot_ctr), col = "red", pch = 3, add = TRUE)
State centroids

Figure 23: State centroids

Other geometry-generating functions work on pairs of input geometries (Figure 24):

  • st_intersection
  • st_difference
  • st_sym_difference
  • st_union
Geometry-generating operations on pairs of layers

Figure 24: Geometry-generating operations on pairs of layers

Example 1: Rail density

Splitting lines by polygons

For example, to calculate total rail length per “Nafa” we can use st_intersection to ‘split’ the rail layer by “Nafa”:

rail_int = st_intersection(rail, nafot)
rail_int
## Simple feature collection with 182 features and 3 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##     segment_id  name_eng             area                       geometry
## 9            6 Jerusalem  653.7468 [km^2] LINESTRING (677169.6 352073...
## 22          16 Jerusalem  653.7468 [km^2] LINESTRING (677218.8 352065...
## 34          25 Jerusalem  653.7468 [km^2] LINESTRING (673559.5 351793...
## 106         88 Jerusalem  653.7468 [km^2] LINESTRING (688368.9 351530...
## 195        146 Jerusalem  653.7468 [km^2] MULTILINESTRING ((691812.2 ...
## 196        147 Jerusalem  653.7468 [km^2] LINESTRING (706324.4 351421...
## 1            1  Yizre'el 1231.8174 [km^2] LINESTRING (697854.1 361850...
## 55          44  Yizre'el 1231.8174 [km^2] LINESTRING (715373.6 361169...
## 66          55  Yizre'el 1231.8174 [km^2] LINESTRING (699306 3617893,...
## 153        121  Yizre'el 1231.8174 [km^2] LINESTRING (707133.8 361436...

The result is a new line layer split by “Nafa” borders and including the name_eng attribute:

plot(rail_int[, "name_eng"], lwd = 3, key.width = lcm(4), reset = FALSE)
plot(st_geometry(nafot), border = "lightgrey", add = TRUE)
Intersection result

Figure 25: Intersection result

Line length

The resulting layer has mixed LINESTERING and MULTILINESTRING geometries (Why?)

class(rail_int$geometry)
## [1] "sfc_GEOMETRY" "sfc"

To calculate line length, we need to convert it to MULTILINESTRING:

rail_int = st_cast(rail_int, "MULTILINESTRING")
rail_int
## Simple feature collection with 182 features and 3 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##     segment_id  name_eng             area                       geometry
## 9            6 Jerusalem  653.7468 [km^2] MULTILINESTRING ((677169.6 ...
## 22          16 Jerusalem  653.7468 [km^2] MULTILINESTRING ((677218.8 ...
## 34          25 Jerusalem  653.7468 [km^2] MULTILINESTRING ((673559.5 ...
## 106         88 Jerusalem  653.7468 [km^2] MULTILINESTRING ((688368.9 ...
## 195        146 Jerusalem  653.7468 [km^2] MULTILINESTRING ((691812.2 ...
## 196        147 Jerusalem  653.7468 [km^2] MULTILINESTRING ((706324.4 ...
## 1            1  Yizre'el 1231.8174 [km^2] MULTILINESTRING ((697854.1 ...
## 55          44  Yizre'el 1231.8174 [km^2] MULTILINESTRING ((715373.6 ...
## 66          55  Yizre'el 1231.8174 [km^2] MULTILINESTRING ((699306 36...
## 153        121  Yizre'el 1231.8174 [km^2] MULTILINESTRING ((707133.8 ...

Let’s add a railway length attribute called length:

rail_int$length = st_length(rail_int)
rail_int$length = set_units(rail_int$length, km)
rail_int
## Simple feature collection with 182 features and 4 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 647910.3 ymin: 3427214 xmax: 734378.9 ymax: 3653684
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##     segment_id  name_eng             area                       geometry
## 9            6 Jerusalem  653.7468 [km^2] MULTILINESTRING ((677169.6 ...
## 22          16 Jerusalem  653.7468 [km^2] MULTILINESTRING ((677218.8 ...
## 34          25 Jerusalem  653.7468 [km^2] MULTILINESTRING ((673559.5 ...
## 106         88 Jerusalem  653.7468 [km^2] MULTILINESTRING ((688368.9 ...
## 195        146 Jerusalem  653.7468 [km^2] MULTILINESTRING ((691812.2 ...
## 196        147 Jerusalem  653.7468 [km^2] MULTILINESTRING ((706324.4 ...
## 1            1  Yizre'el 1231.8174 [km^2] MULTILINESTRING ((697854.1 ...
## 55          44  Yizre'el 1231.8174 [km^2] MULTILINESTRING ((715373.6 ...
## 66          55  Yizre'el 1231.8174 [km^2] MULTILINESTRING ((699306 36...
## 153        121  Yizre'el 1231.8174 [km^2] MULTILINESTRING ((707133.8 ...
##               length
## 9    0.09790983 [km]
## 22  13.29636317 [km]
## 34   1.25629623 [km]
## 106 30.09022166 [km]
## 195 14.31484517 [km]
## 196  1.18204492 [km]
## 1    1.57447932 [km]
## 55  15.12395169 [km]
## 66   8.95013104 [km]
## 153  8.96550993 [km]

Join layer with table

Next we aggregate attribute table of rail_int by state, to find the sum of length values:

rail_int = st_drop_geometry(rail_int) 
rail_int = aggregate(rail_int["length"], rail_int["name_eng"], sum)

The result is a data.frame with total length of railway tracks, per “Nafa”:

head(rail_int)
##      name_eng         length
## 1        Akko  36.19126 [km]
## 2    Ashqelon  76.78070 [km]
## 3 Be'er Sheva 119.74786 [km]
## 4      Hadera  46.31090 [km]
## 5       Haifa  41.60683 [km]
## 6   Jerusalem  60.23768 [km]

Next, we can join the aggregated table back to the nafot layer:

nafot = merge(nafot, rail_int, by = "name_eng", all.x = TRUE)

Here is how the nafot layer looks like after the join:

nafot
## Simple feature collection with 15 features and 3 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng              area         length                       geometry
## 1         Akko   955.6361 [km^2]  36.19126 [km] POLYGON ((702725.9 3630513,...
## 2     Ashqelon  1270.6010 [km^2]  76.78070 [km] POLYGON ((660883 3525603, 6...
## 3  Be'er Sheva 13182.9382 [km^2] 119.74786 [km] POLYGON ((639165.2 3481650,...
## 4        Golan  1153.9042 [km^2]        NA [km] POLYGON ((759304.4 3691202,...
## 5       Hadera   577.3104 [km^2]  46.31090 [km] POLYGON ((706537.1 3602188,...
## 6        Haifa   299.1068 [km^2]  41.60683 [km] POLYGON ((701391.6 3631170,...
## 7    Jerusalem   653.7468 [km^2]  60.23768 [km] POLYGON ((672273.8 3518394,...
## 8     Kinneret   639.4222 [km^2]        NA [km] POLYGON ((745560 3649860, 7...
## 9  Petah Tiqwa   283.1575 [km^2]  35.57314 [km] POLYGON ((674110.3 3549169,...
## 10       Ramla   339.2468 [km^2]  82.70228 [km] POLYGON ((672841.5 3544808,...

Length of NA implies there are no railways in the polygon. These NA values should therefore be replaced with zero:

nafot$length[is.na(nafot$length)] = 0
plot(nafot[, "length"])
Total railway length per Nafa

Figure 26: Total railway length per Nafa

Calculating density

Finally, we divide total railway length by state area. This gives us railway density per state:

nafot$density = nafot$length / nafot$area

Sorted:

nafot = nafot[order(nafot$density, decreasing = TRUE), ]
nafot
## Simple feature collection with 15 features and 4 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 620662.1 ymin: 3263494 xmax: 770624.4 ymax: 3691834
## projected CRS:  WGS 84 / UTM zone 36N
## First 10 features:
##       name_eng             area        length                       geometry
## 10       Ramla  339.2468 [km^2] 82.70228 [km] POLYGON ((672841.5 3544808,...
## 13    Tel Aviv  175.6073 [km^2] 33.82369 [km] POLYGON ((674110.3 3549169,...
## 11     Rehovot  323.6068 [km^2] 47.65151 [km] POLYGON ((660883 3525603, 6...
## 6        Haifa  299.1068 [km^2] 41.60683 [km] POLYGON ((701391.6 3631170,...
## 9  Petah Tiqwa  283.1575 [km^2] 35.57314 [km] POLYGON ((674110.3 3549169,...
## 7    Jerusalem  653.7468 [km^2] 60.23768 [km] POLYGON ((672273.8 3518394,...
## 5       Hadera  577.3104 [km^2] 46.31090 [km] POLYGON ((706537.1 3602188,...
## 12      Sharon  348.4451 [km^2] 24.78527 [km] POLYGON ((692687.3 3583974,...
## 2     Ashqelon 1270.6010 [km^2] 76.78070 [km] POLYGON ((660883 3525603, 6...
## 1         Akko  955.6361 [km^2] 36.19126 [km] POLYGON ((702725.9 3630513,...
##              density
## 10 0.24378209 [1/km]
## 13 0.19260989 [1/km]
## 11 0.14725126 [1/km]
## 6  0.13910357 [1/km]
## 9  0.12563022 [1/km]
## 7  0.09214221 [1/km]
## 5  0.08021836 [1/km]
## 12 0.07113107 [1/km]
## 2  0.06042865 [1/km]
## 1  0.03787138 [1/km]

Plotting the layer shows the new density attribute:

plot(nafot[, "density"])
railway density per state

Figure 27: railway density per state

plot(nafot)
Nafot layer with calculated attributes

Figure 28: Nafot layer with calculated attributes

Example 2: Population patterns

Reading statistical areas

In the second example, we are going to examine the dissimilarity between “Nafot” in terms of their age structure. The demographic data come at the statistical area level, which we are going to aggregate to the “Nafa” level. First, we read the statistical areas Shapefile:

stat = st_read("data/statisticalareas_demography2018.gdb")
## Reading layer `statisticalareas_demography2018' from data source `/home/michael/Sync/Presentations/p_2021_01_LMS_Intro_to_Spatial_R/data/statisticalareas_demography2018.gdb' using driver `OpenFileGDB'
## Simple feature collection with 3194 features and 34 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 130175.2 ymin: 378139.7 xmax: 284068.5 ymax: 804530.3
## projected CRS:  Israel 1993 / Israeli TM Grid

Subsetting

The layer has numerous columns:

vars = colnames(stat)
vars
##  [1] "SEMEL_YISHUV"         "STAT11"               "YISHUV_STAT11"       
##  [4] "SHEM_YISHUV"          "SHEM_YISHUV_ENGLISH"  "Stat11_Unite"        
##  [7] "Stat11_Ref"           "Main_Function_Code"   "Main_Function_Txt"   
## [10] "Religion_yishuv_Code" "Religion_yishuv_Txt"  "Religion_Stat_Code"  
## [13] "Religion_Stat_Txt"    "Pop_Total"            "Male_Total"          
## [16] "Female_Total"         "age_0_4"              "age_5_9"             
## [19] "age_10_14"            "age_15_19"            "age_20_24"           
## [22] "age_25_29"            "age_30_34"            "age_35_39"           
## [25] "age_40_44"            "age_45_49"            "age_50_54"           
## [28] "age_55_59"            "age_60_64"            "age_65_69"           
## [31] "age_70_74"            "age_75_up"            "SHAPE_Length"        
## [34] "SHAPE_Area"           "SHAPE"

but, in this case, we are only interested in the population estimates per age group:

vars = vars[grepl("age_", vars)]
vars
##  [1] "age_0_4"   "age_5_9"   "age_10_14" "age_15_19" "age_20_24" "age_25_29"
##  [7] "age_30_34" "age_35_39" "age_40_44" "age_45_49" "age_50_54" "age_55_59"
## [13] "age_60_64" "age_65_69" "age_70_74" "age_75_up"

We will retain only the latter the attributes:

stat = stat[vars]

Repjojecting

Again, we need to make sure both layers are in the same CRS:

stat = st_transform(stat, st_crs(nafot))

The resulting subset of the stat layer is shown in Figure 29.

plot(stat, max.plot = 16)
Population estimates in the `stat` layer

Figure 29: Population estimates in the stat layer

Figure 30 shows one of the attributes in stat with the nafot layer on top:

plot(stat["age_10_14"], pal = hcl.colors(12, "Reds", rev = TRUE), border = "black", lwd = 0.07, reset = FALSE)
plot(st_geometry(nafot), border = "grey", add = TRUE)
The `age_10_14` attribute in `stat`, and the `nafot` layer

Figure 30: The age_10_14 attribute in stat, and the nafot layer

Fill missing data

One this we may notice in Figure 30 is that many of the statistical areas have NA values, meaning zero (rather than unknown) population size. For all practical purposes these should be replaced with “true” zero:

stat[is.na(stat)] = 0

The modification is demonstrated in Figure 31. Now we are ready to “transfer” the demographic estimates from the statistical area level, to the “Nafa” level.

plot(stat["age_10_14"], pal = hcl.colors(12, "Reds", rev = TRUE), border = "black", lwd = 0.07, reset = FALSE)
plot(st_geometry(nafot), border = "grey", add = TRUE)
The `age_10_14` attribute in `stat`, and the `nafot` layer

Figure 31: The age_10_14 attribute in stat, and the nafot layer

Standardizing geometries

x = st_interpolate_aw(stat, nafot, extensive = TRUE)
## Warning in st_interpolate_aw.sf(stat, nafot, extensive = TRUE):
## st_interpolate_aw assumes attributes are constant over areas of x
## Error in CPL_geos_op2(op, x, y): Evaluation error: ParseException: Unknown WKB type 12.
as.data.frame(table(st_geometry_type(stat)))
##                  Var1 Freq
## 1            GEOMETRY    0
## 2               POINT    0
## 3          LINESTRING    0
## 4             POLYGON    0
## 5          MULTIPOINT    0
## 6     MULTILINESTRING    0
## 7        MULTIPOLYGON 3068
## 8  GEOMETRYCOLLECTION    0
## 9      CIRCULARSTRING    0
## 10      COMPOUNDCURVE    0
## 11       CURVEPOLYGON    0
## 12         MULTICURVE    0
## 13       MULTISURFACE  126
## 14              CURVE    0
## 15            SURFACE    0
## 16  POLYHEDRALSURFACE    0
## 17                TIN    0
## 18           TRIANGLE    0
stat = st_cast(stat, "MULTIPOLYGON")
as.data.frame(table(st_geometry_type(stat)))
##                  Var1 Freq
## 1            GEOMETRY    0
## 2               POINT    0
## 3          LINESTRING    0
## 4             POLYGON    0
## 5          MULTIPOINT    0
## 6     MULTILINESTRING    0
## 7        MULTIPOLYGON 3194
## 8  GEOMETRYCOLLECTION    0
## 9      CIRCULARSTRING    0
## 10      COMPOUNDCURVE    0
## 11       CURVEPOLYGON    0
## 12         MULTICURVE    0
## 13       MULTISURFACE    0
## 14              CURVE    0
## 15            SURFACE    0
## 16  POLYHEDRALSURFACE    0
## 17                TIN    0
## 18           TRIANGLE    0

Fix topolopgy

x = st_interpolate_aw(stat, nafot, extensive = TRUE)
## Warning in st_interpolate_aw.sf(stat, nafot, extensive = TRUE):
## st_interpolate_aw assumes attributes are constant over areas of x
## Error in CPL_geos_op2(op, x, y): Evaluation error: TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point 674902.54580551735 3528656.6016591629 at 674902.54580551735 3528656.6016591629.
stat = st_make_valid(stat)

Area-weighed interpolation

x = st_interpolate_aw(stat, nafot, extensive = TRUE)
## Warning in st_interpolate_aw.sf(stat, nafot, extensive = TRUE):
## st_interpolate_aw assumes attributes are constant over areas of x
x$Group.1 = NULL

The result…

plot(x, max.plot = 16)

Calculating proportions

dat = st_drop_geometry(x)
rownames(dat) = nafot$name_eng
dat[, 1:6]
##                age_0_4    age_5_9  age_10_14  age_15_19 age_20_24  age_25_29
## Ramla        34308.157  35480.099  32369.329  29095.881 24620.472  20846.904
## Tel Aviv    127130.244 109620.934  91668.354  82691.547 80979.798 100220.355
## Rehovot      52838.467  51748.292  45701.403  42056.490 37653.274  36262.685
## Haifa        44738.801  41258.817  36659.088  35189.068 37087.742  39010.933
## Petah Tiqwa  72960.980  72360.969  65454.855  56370.752 47684.046  42176.441
## Jerusalem   143014.908 127973.492 113769.180 103933.860 95599.193  83181.747
## Hadera       42729.350  42823.751  40581.731  39749.875 33351.647  30087.665
## Sharon       40842.886  42077.430  40111.430  36793.293 31939.422  28201.258
## Ashqelon     55514.217  50283.571  42664.732  39655.143 39107.549  36808.529
## Akko         56794.113  57174.737  57322.555  59915.198 52538.711  50274.451
## Yizre'el     48496.599  47099.450  45340.007  47387.870 42124.138  39068.472
## Be'er Sheva  81693.567  72531.221  63411.609  60628.745 56235.210  50365.623
## Golan         4541.896   4949.645   4886.759   5050.816  3791.931   3384.257
## Kinneret     10894.496  10116.174   9526.119   9484.625  8823.037   8893.782
## Zefat        12651.048  11674.798  10784.952  10414.150  9279.833   9640.825
dat = sweep(dat, 1, rowSums(dat), "/")
dat[, 1:6]
##                age_0_4    age_5_9  age_10_14  age_15_19  age_20_24  age_25_29
## Ramla       0.09756403 0.10089675 0.09205048 0.08274159 0.07001462 0.05928351
## Tel Aviv    0.08968093 0.07732941 0.06466520 0.05833273 0.05712522 0.07069800
## Rehovot     0.08780640 0.08599476 0.07594610 0.06988902 0.06257181 0.06026094
## Haifa       0.07696696 0.07098013 0.06306693 0.06053796 0.06380437 0.06711295
## Petah Tiqwa 0.09512865 0.09434633 0.08534194 0.07349783 0.06217185 0.05499087
## Jerusalem   0.12794951 0.11449258 0.10178457 0.09298532 0.08552864 0.07441926
## Hadera      0.09602274 0.09623488 0.09119654 0.08932717 0.07494887 0.06761395
## Sharon      0.08737008 0.09001098 0.08580536 0.07870729 0.06832401 0.06032742
## Ashqelon    0.10039783 0.09093817 0.07715945 0.07171659 0.07072626 0.06656847
## Akko        0.08797158 0.08856115 0.08879011 0.09280600 0.08138015 0.07787291
## Yizre'el    0.09384794 0.09114426 0.08773948 0.09170239 0.08151631 0.07560315
## Be'er Sheva 0.11874226 0.10542472 0.09216929 0.08812437 0.08173833 0.07320684
## Golan       0.09003600 0.09811900 0.09687238 0.10012454 0.07516912 0.06708762
## Kinneret    0.09559008 0.08876096 0.08358372 0.08321964 0.07741476 0.07803549
## Zefat       0.10276190 0.09483203 0.08760399 0.08459204 0.07537821 0.07831048

Hierarchical clustering

d = dist(dat)
hc = hclust(d, "average")
k = 3
groups = cutree(hc, k = k)
plot(hc)
rect.hclust(hc, k = k)

library(reshape2)
dat2 = dat
dat2$group = groups
dat2$name = rownames(dat2)
dat2 = melt(dat2, id.vars = c("group", "name"))
head(dat2)
##   group        name variable      value
## 1     1       Ramla  age_0_4 0.09756403
## 2     2    Tel Aviv  age_0_4 0.08968093
## 3     1     Rehovot  age_0_4 0.08780640
## 4     2       Haifa  age_0_4 0.07696696
## 5     1 Petah Tiqwa  age_0_4 0.09512865
## 6     3   Jerusalem  age_0_4 0.12794951
library(ggplot2)
ggplot(dat2, aes(x = variable, y = value, group = name)) +
    geom_line() +
    facet_wrap(~ group)

More information

half-size image half-size image half-size image half-size image

Other:

Thank you for listening!